1.5 The sequential probability ratio test. Sec. 1.1 treated tests between two laws
$P$ and $Q$ on a sample space $X$. If we have independent, identically distributed (i.i.d.)
observations $X_1, \ldots, X_n$ with distribution $P$ or $Q$, then $X$ can be replaced by the
Cartesian product $X^n$ of $n$ copies of $X$, on which one decides between two laws $P^n$ and
$Q^n$, so that we are back in the situation of deciding between two laws on a sample space.
But now, decision procedures will be considered where the statistician, instead of having to
decide between $P$ and $Q$ for a fixed $n$, is allowed to gather additional observations. An
infinite amount of data would allow a correct decision to be made between any two different
probability measures with zero error probabilities, but now it will be assumed that each
observation has a cost $c$, measured on the same scale as loss functions. So if
$c > \max(L_{PQ}, L_{QP})$, it would not be worth taking any observations; for a prior $\pi$
one should choose $P$ if $\pi(Q)L_{QP} < \pi(P)L_{PQ}$, choose $Q$ if
$\pi(Q)L_{QP} > \pi(P)L_{PQ}$, and make an arbitrary choice if $\pi(Q)L_{QP} = \pi(P)L_{PQ}$.
More typically, the cost $c$ per observation is rather small relative to $L_{PQ}$ and $L_{QP}$,
so it will be worth taking some observations. A decision rule will choose among three options
after the $n$th observation: decide in favor of $P$ or of $Q$ (and stop taking observations),
or take at least one more observation. Note that if $nc > \max(L_{PQ}, L_{QP})$, more would
be spent on $n$ observations than could be lost by making a wrong decision, so for a good
strategy, the probability of taking $n$ or more observations should not be too high. Still, if
one has already made $n - 1$ observations, the $(n - 1)c$ spent on them is already lost and
it may well be that the next observation, with cost $c$, is worth taking.
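
The rule for deciding without any observations can be written out as a small Python sketch. The function name and argument names are hypothetical, and the reading of $L_{PQ}$ as the loss when $P$ is true but $Q$ is chosen (and $L_{QP}$ as the reverse) is inferred from the comparison stated above rather than taken from Sec. 1.1.

```python
def decide_without_data(prior_P, L_PQ, L_QP):
    """Choose between P and Q with no observations, given a prior on P.

    Assumed convention (inferred from the rule in the text):
      L_PQ = loss when P is true but Q is chosen,
      L_QP = loss when Q is true but P is chosen.
    """
    prior_Q = 1 - prior_P
    risk_if_choose_P = prior_Q * L_QP   # we lose only when Q is true
    risk_if_choose_Q = prior_P * L_PQ   # we lose only when P is true
    if risk_if_choose_P < risk_if_choose_Q:
        return "P"
    if risk_if_choose_P > risk_if_choose_Q:
        return "Q"
    return "either"                     # tie: the choice is arbitrary
```

A sequential rule only improves on this when the observation cost $c$ is small enough relative to the two losses.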

We will have a sequence of possible observations $X_1, X_2, \ldots$, i.i.d. with distribution
$\mu = P$ or $Q$. Let $f$ be the likelihood ratio $R_{Q/P}$ on $X$. Then $0 \le f \le +\infty$.
After $n$ observations the likelihood ratio of $Q^n$ to $P^n$ is

$$R_{Q^n/P^n}(X_1, \ldots, X_n) := r_n(X_1, \ldots, X_n) := \prod_{j=1}^{n} f(X_j)$$

for $n = 1, 2, \ldots$. Let $r_0 = 1$.

The probability that $r_n$ is an undefined product $0 \cdot \infty$ will be $0$ under either
$P^n$ or $Q^n$, since $f = 0$ has probability $0$ for $Q$ and $f = \infty$ has probability $0$
for $P$. For a fixed $n$, the Neyman-Pearson lemma (Sec. 1.1) tells us to choose $P$ if $r_n$ is
small and $Q$ if $r_n$ is large. In the sequential probability ratio test (SPRT) to be defined,
the idea is that if $r_n$ is in an intermediate range, we make no decision and continue
sampling. Specifically, for $0 < A \le B < \infty$, the SPRT$(A, B)$ is the decision rule which
calls for taking observations $X_1, \ldots, X_n$ until the least $n$ such that $r_n \le A$ or
$r_n \ge B$, then to choose $P$ if $r_n \le A$ or $Q$ if $r_n \ge B$ (and $r_n > A$, which will
be true automatically except in the unusual case that $r_n = A = B$). We will usually have
$0 < A < 1 < B < \infty$. If $A \ge 1$ the test chooses $P$, or for $B \le 1$ and $A < 1$ it
chooses $Q$, for $n = 0$, that is, without taking any observations.
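
The stopping rule just defined can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sprt`, the convention of returning `"undecided"` when a finite data stream runs out, and the use of a plain callable `f` for the likelihood ratio $R_{Q/P}$ are all choices made here, not part of the text.

```python
from typing import Callable, Iterable, Tuple

def sprt(observations: Iterable, f: Callable[[object], float],
         A: float, B: float) -> Tuple[str, int]:
    """Run SPRT(A, B) on a stream of observations.

    f(x) is the likelihood ratio R_{Q/P} at x.  Returns (decision, N) where
    decision is "P", "Q", or "undecided" (data exhausted before stopping;
    this last outcome is an artifact of using a finite stream).
    """
    if not (0 < A <= B < float("inf")):
        raise ValueError("need 0 < A <= B < infinity")
    r = 1.0          # r_0 = 1
    n = 0
    stream = iter(observations)
    while True:
        if r <= A:   # checked first, so r_n = A = B leads to choosing P
            return "P", n
        if r >= B:
            return "Q", n
        try:
            x = next(stream)
        except StopIteration:
            return "undecided", n
        r *= f(x)    # r_n = r_{n-1} * f(X_n)
        n += 1
```

For instance, with a hypothetical two-point likelihood ratio `f = {1: 2.0, 0: 0.5}.get`, calling `sprt([1, 0, 1, 1, 1], f, 0.25, 4.0)` stops as soon as the running product reaches 4. With $A \ge 1$ the function returns `("P", 0)` without reading any data, matching the remark about $n = 0$.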

In general, a sequential test (non-randomized) will be a sequence of measurable functions
$\{\phi_n(X_1, \ldots, X_n)\}_{n \ge 0}$. Each $\phi_n$ has possible values $-1$, $0$ and $1$,
where we stop sampling at $N$, the least $n$ such that $\phi_n \ne 0$, and then choose $P$ if
$\phi_N = -1$ and $Q$ if $\phi_N = 1$. Here $N$ is random. We would like to choose a test so as
to minimize the expectations of $N$ under $P$ or $Q$ as well as the error probabilities. The
total cost plus loss will be $Nc$ plus the loss $L_{PQ}$ or $L_{QP}$, if any.

For any sequential test $\phi = \{\phi_n\}$ of $P$ vs. $Q$, let $\alpha(\mu, \phi)$ be the
probability that $\phi$ rejects $\mu$ when it is true, and let $E_{\mu,\phi}N$ be the
expectation of $N$ if $\mu$ is true, both for $\mu = P$ or $Q$. Sequential probability ratio
tests have an optimality property among all sequential tests, as follows:

1.5.1 Theorem (A. Wald, J. Wolfowitz, T. Ferguson, D. Burkholder, R. Wijsman). Let
$\psi$ = SPRT$(A, B)$ where $0 < A < 1 < B < \infty$ and let $\phi$ be any other sequential
test. Suppose that for two laws $P$ and $Q$, $\alpha(P, \phi) \le \alpha(P, \psi)$ and
$\alpha(Q, \phi) \le \alpha(Q, \psi)$. Then $E_{P,\psi}N \le E_{P,\phi}N$ and
$E_{Q,\psi}N \le E_{Q,\phi}N$.

In other words, if $\psi$ is a sequential probability ratio test with $\alpha(P, \psi) = \alpha$
and $\alpha(Q, \psi) = \gamma$, then in the class of all sequential tests $\phi$ with
$\alpha(P, \phi) \le \alpha$ and $\alpha(Q, \phi) \le \gamma$, the SPRT $\psi$ minimizes
$E_{P,\phi}N$, and it minimizes $E_{Q,\phi}N$. This optimality property is unexpectedly strong,
since one might have thought that by allowing $E_{P,\phi}N$ to be larger one could make
$E_{Q,\phi}N$ smaller, or vice versa.

Theorem 1.5.1 will be proved in Sec. 1.7. First, here are some other facts. 

1.5.2 Lemma. For any $\psi$ = SPRT$(A, B)$ of $P$ vs. $Q$ with $0 < A \le B < \infty$, there
is a $\delta$ with $0 < \delta < 1$ and a $C < \infty$ such that
$\mu^{\infty}(N > n) \le C(1 - \delta)^n$ for $\mu = P$ or $Q$.

Proof. If $A = B$ then $N = 0$ and there is no problem. So assume $A < B$. Let
$Z_i := \log f(X_i)$. Then the $Z_i$ are independent, identically distributed random variables
with $-\infty \le Z_i \le +\infty$. Let $S_n := Z_1 + \cdots + Z_n$, which is well-defined
almost surely for $P^{\infty}$ or $Q^{\infty}$. Then $N > n$ if and only if
$\log A < S_j < \log B$ for $j = 1, \ldots, n$. Since $P \ne Q$, there is a $b > 0$ such that
$p := \mu(|Z_1| > b)/2 > 0$ for $\mu = P$ or $Q$ (if the two values of $p$ differ, take the
smaller). Under $\mu$, at least one of the events $\{Z_1 > b\}$, $\{Z_1 < -b\}$ then has
probability at least $p$. So for some $m$ large enough, namely $mb \ge \log(B/A)$, there is
probability at least $p^m$ that $|S_m| > \log(B/A)$. The events
$D_i := \{|S_{im} - S_{(i-1)m}| > \log(B/A)\}$ are independent for $i = 1, 2, \ldots$, and if
$D_i$ occurs then $N \le im$. So the probability that $N > im$ is $\le (1 - p^m)^i$ for
$\mu = P$ or $Q$. Thus for all $j = 1, 2, \ldots$, $\Pr(N > j) \le (1 - p^m)^{[j/m]}$, where
$[x]$ is the largest integer $\le x$ and $\Pr = \mu^{\infty}$, $\mu = P$ or $Q$. Thus
$\Pr(N > j) \le (1 - p^m)^{j/m - 1}$, so let $\delta := 1 - (1 - p^m)^{1/m}$ and
$C := 1/(1 - p^m)$. □
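
The geometric decay in Lemma 1.5.2 can be seen exactly in a simple special case. The Python sketch below assumes, purely for illustration, that the likelihood ratio takes only the two values $t$ and $1/t$ with $A = t^{-j}$ and $B = t^{k}$, so that $r_n = t^{z}$ for an integer $z$ that moves by $\pm 1$ at each step; the function name and its `under` argument are hypothetical.

```python
from fractions import Fraction

def survival_probs(t, j, k, n_max, under="P"):
    """Return [Pr(N > n) for n = 0..n_max] for SPRT(t**-j, t**k), j, k >= 1.

    Assumes R_{Q/P} takes only the values t and 1/t, so that
    P(f = t) = 1/(t+1) and Q(f = t) = t/(t+1).
    """
    t = Fraction(t)
    up = 1 / (t + 1) if under == "P" else t / (t + 1)   # mu(f = t)
    alive = {0: Fraction(1)}    # exponent z of r_n = t**z -> mass still sampling
    tails = [Fraction(1)]       # Pr(N > 0) = 1 because A < 1 < B
    for _ in range(n_max):
        nxt = {}
        for z, mass in alive.items():
            for z2, w in ((z + 1, up), (z - 1, 1 - up)):
                if -j < z2 < k:                 # A < r_n < B: keep sampling
                    nxt[z2] = nxt.get(z2, 0) + mass * w
        alive = nxt
        tails.append(sum(alive.values(), Fraction(0)))
    return tails
```

Comparing successive entries of the returned list shows the tail probabilities shrinking by a roughly constant factor every few steps, which is the behavior the lemma guarantees in general.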

It follows from Lemma 1.5.2 that $E_{P,\psi}N < \infty$ and $E_{Q,\psi}N < \infty$. To apply
sequential probability ratio tests it is useful to know relations between $A$, $B$ and the
error probabilities $\alpha(P, \psi)$ and $\alpha(Q, \psi)$ such as the following.

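1.5.3 Proposition. For the test $\psi$ = SPRT$(A, B)$ of $P$ vs. $Q$, with
$0 < A < 1 < B < +\infty$, let $\alpha_0 := \alpha(P, \psi)$ and $\alpha_1 := \alpha(Q, \psi)$.
Then

$$\alpha_0 \le (1 - \alpha_1)/B \le 1/B \quad\text{and}\quad \alpha_1 \le (1 - \alpha_0)A \le A.$$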



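Proof. Let $F_n$ be the event that $N = n$ and $r_n \ge B$. Then the events $F_n$ are disjoint
and

$$B\,\alpha(P, \psi) = B \sum_{n=1}^{\infty} P^n(F_n) \le \sum_{n=1}^{\infty} Q^n(F_n) = 1 - \alpha(Q, \psi).$$

Here the inequality holds because $r_n \ge B$ on $F_n$, and the last equality because
$N < \infty$ almost surely by Lemma 1.5.2. The other inequality is proved symmetrically, since
SPRT$(A, B)$ for $P$ vs. $Q$ is equivalent to SPRT$(1/B, 1/A)$ for $Q$ vs. $P$. □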

Examples. Here is a class of examples to show that of the four inequalities in Proposition
1.5.3, the first and third are sharp (they become equalities in special cases). Of course,
the second inequality is an equation only for $\alpha_1 = 0$ and the fourth for
$\alpha_0 = 0$. Zero error probabilities are possible but unusual.

There are $P$ and $Q$ such that $r_1 := R_{Q/P}$ takes only two values $t$ and $1/t$ where
$1 < t < \infty$. Specifically, if $p := P(r_1 = t)$, then $1 = Q(X) = pt + (1 - p)/t$, so
$p = 1/(t + 1)$.

Let $A = 1/t^j$ and $B = t^k$ for some positive integers $j$ and $k$. For all $n$, clearly
$r_n = t^z$ for some $z \in \mathbb{Z} := \{0, \pm 1, \pm 2, \ldots\}$. If $r_n = t^z$ then
$r_{n+1} = t^{z \pm 1}$. So if $r_N \ge B$ then $r_N = B$ (recalling the definition of $N$) and
if $r_N \le A$ then $r_N = A$. So the only inequality in the proof of Proposition 1.5.3 for $B$
becomes an equation, and likewise for $A$. So $\alpha_0 = (1 - \alpha_1)/B$ and
$\alpha_1 = (1 - \alpha_0)A$. The two linear equations can then be solved for $\alpha_0$ and
$\alpha_1$, giving $\alpha_0 = (1 - A)/(B - A)$ and $\alpha_1 = (B - 1)A/(B - A)$.

In general, however, it can easily happen that $r_N > B$ ("overshoot") or $r_N < A$
("undershoot"). If $r_1$ is bounded above by $K$, then $r_N < KB$, and if $r_1$ is bounded
below by $\varepsilon$, then $r_N > \varepsilon A$.
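
Both the sharp case and the overshoot case can be explored numerically. The Python sketch below is an illustration only: it assumes 0/1-valued observations with $P(X_1 = 1) = p$ and $Q(X_1 = 1) = q$, a hypothetical function name, and a truncation after `n_max` steps, so the returned error probabilities are slight underestimates of $\alpha_0$ and $\alpha_1$. Values of $r_n$ are kept as exact fractions so that comparisons with $A$ and $B$ are exact; probabilities are ordinary floats.

```python
from fractions import Fraction

def sprt_error_probs(p, q, A, B, n_max=400):
    """Approximate (alpha_0, alpha_1) for SPRT(A, B) of P vs. Q when the
    observations are 0/1 valued with P(X=1) = p and Q(X=1) = q, 0 < p, q < 1."""
    p, q, A, B = (Fraction(v) for v in (p, q, A, B))
    if not (0 < A < 1 < B):
        raise ValueError("need 0 < A < 1 < B")
    f = {1: q / p, 0: (1 - q) / (1 - p)}      # likelihood ratio R_{Q/P}
    prob_P = {1: float(p), 0: float(1 - p)}
    prob_Q = {1: float(q), 0: float(1 - q)}
    alive = {Fraction(1): (1.0, 1.0)}         # r_n -> (P-mass, Q-mass) still sampling
    alpha0 = alpha1 = 0.0
    for _ in range(n_max):
        nxt = {}
        for r, (mass_P, mass_Q) in alive.items():
            for x in (0, 1):
                r2 = r * f[x]
                m_P, m_Q = mass_P * prob_P[x], mass_Q * prob_Q[x]
                if r2 <= A:                   # stop and choose P: error if Q is true
                    alpha1 += m_Q
                elif r2 >= B:                 # stop and choose Q: error if P is true
                    alpha0 += m_P
                else:                         # A < r_n < B: continue sampling
                    old_P, old_Q = nxt.get(r2, (0.0, 0.0))
                    nxt[r2] = (old_P + m_P, old_Q + m_Q)
        alive = nxt
    return alpha0, alpha1
```

With $p = 1/3$, $q = 2/3$ (so $t = 2$) and $A = 1/4$, $B = 4$, the result should agree with the closed forms $(1 - A)/(B - A)$ and $(B - 1)A/(B - A)$ up to rounding. With a choice such as $p = 1/2$, $q = 3/4$, $A = 1/5$, $B = 5$, the ratio $r_n$ can never equal $A$ or $B$ exactly, and the inequalities of Proposition 1.5.3 come out strict.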

Proposition 1.5.3 gave bounds for error probabilities for SPRTs, which became equations in
the last example. In that example, it is also possible to find explicitly the average
sample numbers $E_{\mu,\psi}N$ for $\mu = P$ and $Q$.

Definition. A random variable $\tau$ which is a function of $X_1, X_2, \ldots$ is a stopping
time if $\tau$ has nonnegative integer values and for all $n = 1, 2, \ldots$ there is an event
$A_n$ such that $\tau \le n$ if and only if $(X_1, \ldots, X_n) \in A_n$, while for $n = 0$,
$\{\tau = 0\}$ is either empty or the whole space.

For example, if $\tau = N$ for an SPRT or any sequential test $\phi$, then $N \le n$ if and
only if for some $j \le n$, $\phi_j(X_1, \ldots, X_j) \ne 0$, so $N$ is a stopping time. Recall
that "i.i.d." means "independent and identically distributed."

1.5.4 Wald's identity. If $Y_1, Y_2, \ldots$ are i.i.d., $E|Y_1| < \infty$,
$T_n := Y_1 + \cdots + Y_n$ for $n \ge 1$, $T_0 := 0$, and $\tau$ is a stopping time with
$E\tau < \infty$, then $ET_\tau = E\tau\,EY_1$.

Proof. The identity is clear if $\tau = 0$, so we can assume $\tau \ge 1$ almost surely. A
mathematical note: the following proof will apply first in the case where $Y_i \ge 0$ a.s.,
replacing $Y_i$ by $|Y_i|$ for all $i$, which will justify interchanging sums, and sums with
expectations, then for general $Y_j$. We have

$$ET_\tau = \sum_{n=1}^{\infty} P(\tau = n)\,E(Y_1 + \cdots + Y_n \mid \tau = n) = \sum_{n=1}^{\infty} P(\tau = n) \sum_{j=1}^{n} E(Y_j \mid \tau = n)$$

where the conditional expectation is replaced by $0$ if $P(\tau = n) = 0$. Note: if $Y_j$ were
independent of $\{\tau = n\}$ for $j \le n$ then $E(Y_j \mid \tau = n) = EY_j$ and Wald's
identity would follow. But this independence does not hold in general. Instead,

$$ET_\tau = \sum_{j=1}^{\infty} \sum_{n=j}^{\infty} E(Y_j \mid \tau = n)\,P(\tau = n) = \sum_{j=1}^{\infty} \sum_{n=j}^{\infty} E(Y_j 1_{\{\tau = n\}}).$$

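The event $\{\tau \le j - 1\}$ is independent of $Y_j$, and hence so is its complement
$\{\tau \ge j\}$. Summing over $n \ge j$ gives $E(Y_j 1_{\{\tau \ge j\}}) = E(Y_j)P(\tau \ge j)$,
so

$$ET_\tau = \sum_{j=1}^{\infty} E(Y_j)\,P(\tau \ge j) = EY_1 \sum_{j=1}^{\infty} P(\tau \ge j).$$

Now

$$\sum_{j=1}^{\infty} P(\tau \ge j) = \sum_{j=1}^{\infty} \sum_{k \ge j} P(\tau = k) = \sum_{k=1}^{\infty} \sum_{j=1}^{k} P(\tau = k) = \sum_{k=1}^{\infty} k\,P(\tau = k) = E\tau$$

and the identity follows. □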
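
Wald's identity can be checked exactly for a bounded stopping time by enumerating all finite sequences. The Python sketch below does this for $\tau = \min(\text{first } n \text{ at which a user-supplied rule fires},\, m)$; the function name, the cap $m$, and the form of the rule `stop` are assumptions made for the illustration. Because `stop` is only ever shown $Y_1, \ldots, Y_n$, the resulting $\tau$ is a stopping time.

```python
from fractions import Fraction
from itertools import product

def wald_check(values, probs, stop, m):
    """Return (E[T_tau], E[tau] * E[Y_1]) computed exactly.

    values, probs: the finite distribution of each Y_i (probs as Fractions).
    stop(ys): True if sampling stops after seeing the list ys = [Y_1..Y_n].
    tau is the first n <= m at which stop fires, or m if it never does.
    """
    e_T = Fraction(0)
    e_tau = Fraction(0)
    for idx in product(range(len(values)), repeat=m):
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]               # probability of this sequence
        ys = [values[i] for i in idx]
        tau = next((n for n in range(1, m + 1) if stop(ys[:n])), m)
        e_T += weight * sum(ys[:tau])        # T_tau = Y_1 + ... + Y_tau
        e_tau += weight * tau
    e_Y = sum(v * p for v, p in zip(values, probs))
    return e_T, e_tau * e_Y
```

For example, with values $-1$ and $2$, each of probability $1/2$, and the rule "stop once the running sum is at least 1", the two returned numbers should coincide exactly, even though $Y_j$ and $\{\tau = n\}$ are not independent.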

Recall the examples just after Prop. 1.5.3, where $r_1 = f = R_{Q/P}$ only has values $1/t$ or
$t$, and possibly $1$, for some $t > 1$, and we do a test $\psi$ = SPRT$(A, B)$ for
$A = 1/t^j$ and $B = t^k$ for some positive integers $j, k$. Then $r_N$ has only two possible
values $A$, $B$. Take $Y_i := \log f(X_i)$, which has possible values $0$ or $\pm\log t$, and
$T_N$ has possible values $\log A$ or $\log B$. Then
$P^{\infty}(r_N = B) = \alpha(P, \psi) = \alpha_0$ and
$Q^{\infty}(r_N = A) = \alpha(Q, \psi) = \alpha_1$. Recall that $\alpha_0$ and $\alpha_1$ can be
found explicitly in this case in terms of $A$, $B$. Thus we can find $E_\mu T_N$, and by Wald's
identity we can evaluate $E_{\mu,\psi}N = E_\mu T_N / E_\mu Y_1$, where $E_\mu$ is the
expectation when $\mu$ is the true probability law and $\mu = P$ or $Q$.
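
The recipe in the last paragraph can be sketched in Python with exact rational arithmetic. The sketch assumes the two-valued case only, in which $f$ is $t$ or $1/t$ and never $1$, so that $P(f = t) = 1/(t+1)$ and $Q(f = t) = t/(t+1)$. All expectations are expressed in units of $\log t$, which cancels in the ratio. The function name is hypothetical.

```python
from fractions import Fraction

def average_sample_numbers(t, j, k):
    """Return (E_P N, E_Q N) for SPRT(t**-j, t**k) when R_{Q/P} is t or 1/t.

    Uses the explicit error probabilities from the example after
    Proposition 1.5.3 and Wald's identity E N = E T_N / E Y_1.
    """
    t = Fraction(t)
    A, B = t ** -j, t ** k
    alpha0 = (1 - A) / (B - A)          # P chooses Q, i.e. r_N = B under P
    alpha1 = (B - 1) * A / (B - A)      # Q chooses P, i.e. r_N = A under Q
    # E T_N divided by log t: T_N is k*log t at B and -j*log t at A
    ET_P = alpha0 * k - (1 - alpha0) * j
    ET_Q = (1 - alpha1) * k - alpha1 * j
    # E Y_1 divided by log t: +1 with probability mu(f = t), else -1
    EY_P = (1 - t) / (1 + t)
    EY_Q = (t - 1) / (t + 1)
    return ET_P / EY_P, ET_Q / EY_Q
```

Under $P$ both $E_P T_N$ and $E_P Y_1$ are negative, so the ratio is positive, as an expected sample size must be. If $f$ may also take the value $1$, the probabilities of the values $t$ and $1/t$ change and `EY_P`, `EY_Q` would have to be recomputed accordingly.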

PROBLEMS 

1. Let $X = \{0, 1, 2\}$, $P(0) = Q(2) = 1/3$, $P(1) = Q(1) = 1/2$, and $P(2) = Q(0) = 1/6$.
For $\psi$ = SPRT$(1/4, 4)$ of $P$ vs. $Q$, find upper bounds (as small as possible) for the
error probabilities $\alpha_0$ and $\alpha_1$.

2. For $\psi$ in Problem 1, evaluate the average sample numbers $E_{P,\psi}N$ and
$E_{Q,\psi}N$. Hint: use Wald's identity, applied to $Y_i = \log R_{Q/P}(X_i)$.

3. Suppose that for some $K > 1$, $R_{Q/P}(x) \le K$ for all $x$. Show that on the event
$F_n$ in the proof of Proposition 1.5.3, $B \le r_n < KB$. Then show that
$\alpha_0 \ge (1 - \alpha_1)/(KB)$. Similarly if $R_{Q/P}(x) \ge 1/M$ for all $x$, show that
$\alpha_1 \ge (1 - \alpha_0)A/M$.

4. Pairs of patients participate in a trial of a blood pressure drug. One of each pair is
chosen at random to get the drug while the other gets a placebo which has no effect.
Later, the patients' blood pressures are measured. For the $i$th pair, let $X_i$ be the
measurement [average of systolic and diastolic] for the patient getting the drug minus
that of the other member of the pair. Hypothesis $P$ is the "null" hypothesis that the
drug is ineffective, so $P(X_i < 0) = 1/2$. Hypothesis $Q$ has $Q(X_i < 0) = 0.6$. Let
$Y_i = 1$ if $X_i < 0$ and $Y_i = 0$ otherwise. Based on the $Y_i$ data, we want to find a test
$\psi$ = SPRT$(1/B, B)$ of $P$ vs. $Q$ such that the error probabilities $\alpha_0$ and
$\alpha_1$ are both $\le 0.05$.

(a) Give a $B_1$ such that $B \ge B_1$ is sufficient, using $\alpha_0 \le 1/B$ and
$\alpha_1 \le A$ in Proposition 1.5.3.

(b) Give a $B_0$ such that $B \ge B_0$ is necessary, using the previous problem.

5. In the example at the end of the section with $t = 2$, and $R_{Q/P} = t$ or $1/t$ (not 1),

(a) Find the smallest sample size $n$ for which a (possibly randomized but non-sequential)
test for $n$ observations given by the Neyman-Pearson lemma (Sec. 1.1) has
both error probabilities $\le 0.05$ (in other words, size $\le 0.05$ and power $\ge 0.95$).
Hint: if $b(k, n, p)$ is the probability of exactly $k$ successes in $n$ independent trials
with probability $p$ of success on each trial, and $E(k, n, p)$ the probability of $k$ or
more successes, then $E(12, 23, 1/3) = 0.04805$ while
$E(12, 22, 1/3) + \frac{1}{2} b(11, 22, 1/3) = 0.05572$.

(b) Find an SPRT$(A, B)$ for the same $P$ and $Q$ with both error probabilities $\le 0.05$.

(c) To evaluate the relative efficiency of the SPRT, find $EN/n$ for $P$ and for $Q$ with
$n$ from (a) and $N$ from (b), or give an upper bound for it as small as possible (we
hope, less than 1).

6. In problem 2 of Sec. 1.1, find an SPRT$(A, B)$ for $P$ vs. $Q$ having both error
probabilities $\le 0.05$, with $B$ as small as possible, $B > 1$, for which the inequality in
Proposition 1.5.3 becomes an equality. Then, find the actual error probabilities (which may be
less than 0.05).

7. This relates to Wald's identity 1.5.4 and its proof. Let $Y_1, Y_2, \ldots$ be independent,
identically distributed random variables with $P(Y_1 = -1) = P(Y_1 = 3) = 1/2$. Let $\tau$
be the least $j$ such that $|Y_j| > 2$. Which of the following equations are valid? Show in
each case what the left and right sides do equal.

(a) $E(Y_1 \mid \tau = 1) = EY_1$, (b) $E(Y_1 \mid \tau = 2) = EY_1$,
(c) $E(Y_2 \mid \tau = 2) = EY_2$, (d) $E(Y_1 + Y_2 \mid \tau = 2) = 2EY_1$,
(e) $E(Y_1 \mid \tau \ge 1) = EY_1$, (f) $E(Y_2 \mid \tau \ge 2) = EY_2$,
(g) $E(Y_1 + Y_2 \mid \tau \ge 2) = 2EY_1$.






